DETAILED ACTION
Specification 

1. The title of the invention is not descriptive. A new title is required that is clearly
indicative of the invention to which the claims are directed, and the term "Rendering" is 
redundant because synthesizing an image, at least in the sense used by applicant, 
prima facie requires rendering the image. 

The following title is suggested: Geometry-Driven Feature Point-Based Image 
Synthesis. 

Drawings 

Examiner accepts the drawings. 

Claim Objections 

2. Claim 1 is objected to because the term "geometric component" is unclear. 
Applicant is required to clarify the meaning of this terminology, in that it is unclear if the
claim language intends to recite polygons or higher-order components, since the term 
'geometric' generally in the art implies linearity, and most of the modeling techniques in 
computer graphics are non-linear (e.g. NURBS, superquadrics, Bezier splines or
curves, et cetera). 

3. Claim 30 is objected to because it is improperly dependent, in that it states that it is
dependent upon claim 24, where that is a dependent claim not having the limitations 
recited in claim 30. Claim 30 should be dependent upon claim 25. Examiner will treat 
this claim (for purposes of examination in this action) as being dependent upon claim 
25; applicant is required to correct this deficiency. 

4. Claims 25-34 are objected to because they utilize the term "expression" where
the term does not have clear meaning associated with it. In claims 35-43, the term 
"facial expression" is used, where it is known that faces have expressions. However, 
the term 'expression' in general English and the art usually is taken to mean 'something 
that communicates' or 'a facial aspect or look that conveys a feeling' or 'the outward
manifestation of a feeling or mood'. Therefore, the term as used in claims 25-34 is 
unclear and vague. Applicant is required to clarify the meaning of this term and/or 
amend the claim to use language that clearly expresses the intent of the invention. 

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

6. Claims 1-14 and 24 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Cosatto et al (E. Cosatto and H. Graf, "Photorealistic Talking-Heads from Image 
Samples")(Reference provided by applicant on IDS)('Cosatto'). 

7. As to claim 1,

A computer implemented method for rendering a synthesized image, comprising: 
-Generating a geometric component corresponding to a selected image based on 
identified feature points from a set of representative images having the identified feature 
points; and (Cosatto teaches in section 1, page 152, that in the first step, image
samples of facial parts are generated, resulting in a database of facial parts. Pages
153-154, section III, teach the methods of how this is done, and how the hierarchy of
parts and samples are obtained and subsequently ordered. Section IV on page 154 
states that the first step in the process is measuring the face to determine the location of 
certain facial points (e.g. the recited feature points), which correspond to the "identified 
feature points" above. The "set of representative images" is the video recorded in 
section I; all faces would have the same general set of features, e.g. eyes, nose, et 
cetera. The system of Cosatto clearly synthesizes a geometric component, e.g. 
synthetic video, with specific emphasis on for example the mouth, section V-B (pages 
159-161) with other facial parts discussed in section V-D (page 161), which clearly 
constitutes "generating a geometric component", and the selected image is simply one 
frame of video wherein the synthesized face is saying something (e.g. see section V-B).)
-Generating the selected image from a composite of the set of representative images 
based on the geometric component. (Clearly, the system of Cosatto generates selected
images (as set forth above) from a set of representative images (e.g. see Fig. 6, page
160, where "from a phonetic transcript, parameters of the mouth are calculated ... to
obtain an animation script"). The "phonetic transcript" is a video clip of an individual
saying a phoneme as set forth in sections V-B and V-C on pgs 159-161. Clearly, the
images are sampled for the mouth, which would be for example the recited "geometric
component". Finally, video - and the image database as recited in section III, pages
153-154 - prima facie teach the use of a "composite" of a set of representative
images.)

As stated and detailed above, the Cosatto reference teaches all the limitations of
the above claim. The system of Cosatto performs a specific application - processing of 
video to generate a database of image components that are then analyzed and used to 
synthesize output images based on this database and the pre-recorded video. 

8. As to claim 2, Cosatto clearly states that certain feature points are measured 
initially - see page 154, where in Fig. 1 and section IV-A, the measurement of a few
feature points is done. This allows the system to track only those points in video,
where however each of those points is recalculated per frame; these points clearly 
constitute the recited "plurality of values" with at least one value associated with at least 
each representative image. Generating the recited synthesized image (as taught on 
pages 159-161), especially for the mouth region, clearly requires using the plurality of 
values to composite the video frames to generate the image database and to composite 
the synthesized image so that it has photorealism. 

9. As to claim 3, the claim requires that the synthesized image and the 
representative or source images be a plurality of subregions that are near each other. 
Clearly a face consists of regions that are proximate to each other, as shown in Cosatto 
in Fig. 1 for example, and discussed in detail in the design of the database for facial 
parts (page 154, sections III-A through III-D). Therefore, as the images are analyzed and put
into the database, and the geometric components (e.g. the lips and mouth cited in 
section V-B) generated, and the final output images synthesized (section V in its 
entirety, sections V-B through V-D, pages 159-161, as well as Figure 8) all of these use 
portions of the face subdivided into regions or quads. 

10. As to claim 4, Cosatto on pg. 161, section V-E, clearly states that the base
bitmap of the face is put into a frame buffer, then the bitmaps of the facial parts are 
projected onto it, and finally that such bitmaps are blended ("gradual blending or 
"feathering" masks"). 
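
By way of illustration only, the compositing step described above (a base bitmap in a frame buffer, part bitmaps overlaid, and a gradual "feathering" blend at the boundaries) can be sketched as follows. This is not taken from Cosatto; the function names `feather_mask` and `composite`, the linear ramp, and the `border` width are all assumptions chosen for the example.

```python
def feather_mask(height, width, border):
    """Return a mask that is 1.0 in the interior and ramps linearly to 0.0 at the edges.

    The linear ramp and the `border` width are illustrative assumptions.
    """
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance in pixels to the nearest edge of the patch.
            d = min(x, y, width - 1 - x, height - 1 - y)
            row.append(min(1.0, d / border) if border > 0 else 1.0)
        mask.append(row)
    return mask


def composite(base, part, top, left, border=2):
    """Blend `part` onto a copy of `base` at (top, left) using a feathered mask."""
    out = [row[:] for row in base]  # leave the base bitmap untouched
    mask = feather_mask(len(part), len(part[0]), border)
    for y, part_row in enumerate(part):
        for x, value in enumerate(part_row):
            a = mask[y][x]
            # Standard alpha blend: the mask weight selects between part and base.
            out[top + y][left + x] = a * value + (1.0 - a) * base[top + y][left + x]
    return out
```

In this sketch the overlaid part contributes nothing at its outermost pixels and fully replaces the base in its interior, so no hard seam appears at the boundary between the two bitmaps.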

11. As to claim 5, Cosatto teaches it for the same reason as above, namely that on 
page 161, on the left side, immediately below Fig. 8, it teaches that regions are blended 
and further that these are facial regions - obviously, regions of the face (except perhaps 
the mouth) will have common texture - in that they will all be covered in skin, and thusly 
alpha blending is used as set forth there, so those boundaries would not have 
discontinuities in texture. 

12. As to claim 6, that would be a trivially obvious variant, given that the system of 
Cosatto is intended primarily for synthesizing the mouth region to create natural- 
appearing speech, there would obviously be more samples of the mouth region than of 
other regions, particularly in the database of parts, as that is derived from all the video- 
recorded phonemes. Therefore, either Cosatto implicitly teaches it or it is a trivially 
obvious variant, and it would be obvious to modify for the reasons set forth immediately 
above, and on page 59, section H, it is stated the mouth database is larger than those 
of other features and an absolute size provided. 

13. As to claim 7, Cosatto on page 155 teaches specific items wherein section II-A
teaches that the 3-D head pose can be recovered from 2-D images and at the bottom 
right of the page it states that 2-D head models provide acceptable ranges for size. 
Further, the system uses video cameras to capture images for analysis purposes, and it 
is prima facie obvious that the feature points are found in 2-D images as recited in the
upper left section of the page, e.g. section A. 

14. As to claim 8, it is a duplicate of claim 3 and the rejection to claim 3 is 
incorporated herein by reference in its entirety. 

15. As to claim 9, it is a duplicate of claim 4; see that rejection. 

16. As to claim 10, this is a trivially obvious variant of claim 7, wherein Cosatto in 
section 2 (pages 152-154, emphasis on page 153) states that three-dimensional images 
using 3-D scanners are common in the art and in prior work. As further discussed in 
section IV-A on page 154, feature points on the face are measured in 3-D. Therefore, it
would be obvious that the feature points could be on a three-dimensional image and it 
would be obvious to modify the system of Cosatto to use three-dimensional images for 
the reasons set forth above. 

17. As to claim 11, it is a duplicate of claim 3 and the rejection to that claim is herein
incorporated by reference - only the one reference is utilized. 

18. As to claim 12, it is a duplicate of claim 4; see that rejection. 

19. As to claim 13, Cosatto teaches in section A on page 155 that "Knowing the
position of a few points in the face allows to recover the 3-D head pose from 2-D
images", where this clearly justifies the examiner's contention that a few key feature
points are used to extract the position of other feature points, see for example section
VI-D on page 157. Section V-B on pages 159-160 clearly teaches how knowledge of a
few points allows synthesis of a great many essential feature points on the mouth, 
which is the key feature. 

20. As to claim 14, Cosatto teaches that obviously feature points are grouped in sets
by different regions of the face - see page 154, sections III-1 through III-4 and Fig. 1 or
of the synthesized image - see page 161, sections D and E. Finding the position of one
feature point on for example the mouth (see section V-B and V-C, particularly page 160)
allows the calculation of the shift in other portions of the images, e.g. where the 
changes in position between one frame and another of the synthesized image are 
minimized to get more natural appearance (e.g. Figure 7 on page 160), which prima 
facie tracks change in position of feature points within the mouth region so as to be able 
to calculate the path that involves the least change in position for Viterbi optimization, 
and the details on feature point location and tracking are found in sections III-D and III-E,
particularly section III-D. Thusly, Cosatto teaches all the limitations.
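
For illustration only, the idea of choosing, frame by frame, the candidate samples whose positions change least between consecutive frames can be sketched as a Viterbi-style dynamic program. This is a generic sketch and is not asserted to be the procedure of Cosatto; the function `smoothest_path` and the two cost callbacks are hypothetical names introduced for the example.

```python
def smoothest_path(candidates, target_cost, transition_cost):
    """Pick one candidate per frame so that the summed cost is minimal.

    candidates      -- list (one entry per frame) of lists of candidate samples
    target_cost     -- hypothetical callback (frame_index, sample) -> float
    transition_cost -- hypothetical callback (previous_sample, sample) -> float
    Returns the list of chosen candidate indices, one per frame.
    """
    # best[t][j]: minimal cost of any path that ends at candidate j of frame t.
    best = [[target_cost(0, c) for c in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for c in candidates[t]:
            costs = [best[t - 1][i] + transition_cost(p, c)
                     for i, p in enumerate(candidates[t - 1])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[i_min] + target_cost(t, c))
            ptr.append(i_min)
        best.append(row)
        back.append(ptr)
    # Trace the cheapest path backwards from the last frame.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path
```

With a transition cost equal to the displacement of a feature point between two samples, the returned path is the sequence with the least total change in position, which is the property the paragraph above relies on.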

21. As to claim 24, on page 153 Cosatto teaches in the left-hand column that
prior work in the field has dealt with aligning images to a reference image; thusly it
would be obvious that the system of Cosatto could perform the recited limitations by
aligning the generated facial image with a reference image, since that would prevent
deviations due to motion from corrupting the base image that the system was working
with, given that the system of Cosatto puts the base shape into the frame buffer and
then projects the other facial regions onto it, with the use of a reference image being an
obvious variant on the "base image" required to do so - this is discussed in the rejections
to the claims above, and motivation is taken from claim 1 above. 

22. Claims 15-23 and 25-43 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Cosatto as applied to claim 14 above, and further in view of Chai et 
al (Chai et al. "Vision-based control of 3D animation".) 

23. As to claim 15, Cosatto does not expressly teach the limitation of using PCA to
track position. Chai teaches the use of PCA on pages 200-201 for example with 
emphasis on sections 4.2 and 4.3, where it is taught that using PCA, the motion frames 
are broken down into linear subspaces and motion is tracked in that way. On page 200, 
section 4.3 it clearly discloses that a database of motion is kept, which would be similar 
to the database of images in Cosatto. The database of motion would be with respect to 
each linear subspace, which obviously could be the different facial regions of Cosatto - 
that is, the positional changes in motion of the images in the database of Cosatto for 
facial regions could be found using the PCA techniques of Chai. Therefore, it would
have been obvious to one having ordinary skill in the art at the time the invention was 
made to combine the PCA of Chai with the motion tracking and splitting of the face into 
different regions of Cosatto for the reasons set forth above, as using PCA allows faster 
computation times for motion detection and improves temporal coherency (pg. 200,
section 4.4 for example).
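
As a hedged illustration of what reducing feature-point motion to a linear subspace means, the sketch below estimates the dominant principal axis of a set of feature-point vectors and projects each frame onto it. It is not Chai's implementation: it assumes each frame's tracked coordinates are flattened into one row, and it substitutes power iteration for a full eigendecomposition to stay dependency-free. All function names are hypothetical.

```python
import math


def mean_center(rows):
    """Subtract the per-column mean from every row; return (mean, centered_rows)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    return mean, [[r[k] - mean[k] for k in range(d)] for r in rows]


def first_principal_axis(rows, iterations=200):
    """Estimate the dominant principal axis by power iteration (an assumed shortcut)."""
    _, centered = mean_center(rows)
    d = len(centered[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        # w = X^T (X v): applies the (unnormalized) covariance without forming it.
        proj = [sum(r[k] * v[k] for k in range(d)) for r in centered]
        w = [sum(p * r[k] for p, r in zip(proj, centered)) for k in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data; keep the current direction
        v = [x / norm for x in w]
    return v


def project(rows, axis):
    """Coordinate of each mean-centered row along `axis`."""
    _, centered = mean_center(rows)
    return [sum(r[k] * axis[k] for k in range(len(axis))) for r in centered]
```

Each frame is then described by a single coordinate along the axis instead of its full coordinate vector, which is the sense in which a subspace representation can shorten later motion comparisons.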

24. As to claim 16, Cosatto tracked motion using overall feature points on the face
(e.g. section III-A, page 155, or section D, page 161), and Cosatto also tracked feature
points within the mouth subset in order to assure more natural-appearing features, as
the difference between each pose was minimized via Viterbi optimization on page 160,
Figure 7. Obviously, overall changes in head position would be tracked via the main
feature points, and determining the necessary positional changes in the mouth
(besides those necessitated by normal motion of
talking) would be based on the positional changes in the larger set of feature points on 
the face itself, e.g. any necessary translational or rotational movement of the overall 
head for example. Since only the primary reference is utilized, no separate motivation 
or combination is required and that from the rejection to the parent claim is herein 
incorporated by reference. 

25. As to claim 17, the system of Cosatto has a hierarchical database structure of
feature parts, see for example section III, page 154, items 1-3, particularly Item I, titled 
"Hierarchy of Parts." Since only the primary reference is utilized, no separate motivation 
or combination is required and that from the rejection to the parent claim is herein 
incorporated by reference. 

26. As to claim 18, the system of Cosatto does not expressly teach this limitation,
insofar as it teaches tracking feature points of the user when the data for the initial sets
is recorded, but it does not expressly perform the recited details, although it does
monitor feature points of a user. The system of Chai performs the recited limitations, in 
that it consists of a video camera that monitors the face of a user and generates an 
image of an avatar making similar facial movements, see for example Fig. 1 on page 
193, the caption specifies that users act out the motion in front of a single-view camera, 
and that the avatars have controlled facial movements similar to those of the user with 
texture mapped models (see section 1, left side of page 194, and Figure 2 on page 195,
and the captions on it). The system of Chai further tracks feature points of the user 
(section 2.1, page 196) on the face and moves the avatar as the user moves (see
section 1.2 on page 196, where motion data and head motion are separated from facial
deformations and then both are applied to the avatar in separate passes). Obviously,
the generated avatars of Chai (Figs. 1 and 2 for example) have separate components of 
the face, or it would be obvious to use the separate components of Cosatto for the face, 
and to utilize the motion tracking and facial deformation techniques of Chai described 
above. It would have been obvious to one having ordinary skill in the art at the time the 
invention was made to combine the systems of Cosatto and Chai, since Chai would 
allow any user to control the facial expressions of an avatar in addition to overlaying 
audio text and simulating real speech - the facial techniques would allow better 
synchronization of voice and facial movements in for example the avatars, and would 
allow even an unskilled user to adequately control facial motions (see section 1, pages
193-194). 

27. As to claim 19, Cosatto teaches in section A on page 155 that "Knowing the
position of a few points in the face allows to recover the 3-D head pose from 2-D
images", where this clearly justifies the examiner's contention that a few key feature
points are used to extract the position of other feature points, see for example section
VI-D on page 157. Section V-B on pages 159-160 clearly teaches how knowledge of a
few points allows synthesis of a great many essential feature points on the mouth, 
which is the key feature. Since only the primary reference is utilized, no separate 
motivation or combination is required and that from the rejection to the parent claim is 
herein incorporated by reference. 

28. As to claim 20, reference Cosatto does not expressly teach this limitation, insofar
as it does teach rendering an image of a speaking human being with the identified
feature points on it (see for example Fig. 8, and facial locations are tracked by feature
points as illustrated by Fig. 4, where the control points are noted). However, Chai
teaches on page 196 in the "initialization" section that the user can select the control 
points, for which it would be an obvious modification to allow the user to control the 
movement of a feature point. Also, since the system of Chai (for example, see caption 
on Fig. 1 on the first page) teaches that the avatar moves in response to user facial and 
head movements, this also constitutes "receiving information indicative of a user moving 
a feature point". It would have been obvious to one having ordinary skill in the art at 
the time the invention was made to combine the systems of Cosatto and Chai, since 
Chai would allow any user to control the facial expressions of an avatar in addition to 
overlaying audio text and simulating real speech - the facial techniques would allow 
better synchronization of voice and facial movements in for example the avatars, and 
would allow even an unskilled user to adequately control facial motions (see section 1,
pages 193-194). 

29. As to claim 21, this claim is a substantial duplicate of claim 16; the rejection to
that claim is herein incorporated by reference in its entirety, along with motivation and 
combination. 

30. As to claims 22 and 23, Cosatto does not expressly teach this limitation, whilst 
Chai teaches in Fig. 1 on page 193 that the user can control or select the facial 
expression by making the desired expression on their own face, e.g. two separate facial 
expressions are shown in the leftmost column, and in the rightmost column the avatars
are shown depicting those facial expressions. Motivation and combination is 
incorporated by reference from claim 20 above. 

31. As to claim 25, firstly, the rejection to claim 1 is herein incorporated by reference
in its entirety. That rejection incorporates the limitations of having and accessing a
database of various portions of a face and various captures of a full face with
corresponding matching feature points. Secondly, the rejection to claim 14 is
incorporated by reference in its entirety. That rejection teaches the limitation of 
determining a feature point from a change in position of another feature point based on 
a change in the selected feature point and the existence of a database (page 153) 
containing facial information (also, the hierarchy of that database is in page 154, section
1). That is, Cosatto teaches those limitations and Chai teaches a motion database on
page 194 on the left side of the page in section 1 and in the caption to Figure 2 on page 
195, Chai teaches that the motion database can be used to synthesize expressions. 
Obviously, displaying the avatars on the screen as shown in Figure 1 of Chai clearly 
constitutes rendering (see for example page 202, the reference to the terms "rendering 
cost"). Finally, it would be trivially obvious that since the user controls the motion and 
deformation of the avatar by means of controlling their own head position and facial 
expressions, and that Chai (e.g. section 1.2, page 196) clearly teaches separating head 
motion and facial expressions such that Chai's system would obviously render a new 
expression based on any change of feature points, with the choice of two being a 
trivially obvious variant. It would have been obvious to one having ordinary skill in the 
art at the time the invention was made to combine the systems of Cosatto and Chai,
since Chai would allow any user to control the facial expressions of an avatar in addition 
to overlaying audio text and simulating real speech - the facial techniques would allow 
better synchronization of voice and facial movements in for example the avatars, and 
would allow even an unskilled user to adequately control facial motions (see section 1, 
pages 193-194). 

32. As to claim 26, this claim is essentially a duplicate of claim 14, with the difference 
that Cosatto teaches that the feature points are grouped in sets according to the region 
of the face, e.g. the hierarchical database shown on page 154, and the rest of the 
limitations are taught in the rejection to claim 14, which is herein incorporated by 
reference in its entirety. Motivation and combination are taken from claim 25 above. 

33. As to claim 27, this claim is a substantial duplicate of claim 15, with the only 
difference that the database of representative images of Cosatto is substituted for the 
motion database of Chai. Chai teaches a motion database on page 194 on the left side 
of the page in section 1 and in the caption to Figure 2 on page 195, Chai teaches that 
the motion database can be used to synthesize expressions. The rest of the limitations 
are taught in the rejection to claim 15, which is herein incorporated by reference in its 
entirety; motivation and combination are taken from claim 25, which is the parent claim. 

34. As to claim 28, this claim is a substantial duplicate of claim 16, with that rejection 
herein incorporated by reference; motivation and combination is from claim 25 above. 

35. As to claim 29, this claim is a substantial duplicate of claim 17, with that rejection 
herein incorporated by reference; motivation and combination is from claim 25 above. 



Application/Control Number: 10/684,773 Page 15 

Art Unit: 2672 

36. As to claim 30, this claim is a substantial duplicate of claim 3, with the rejection to 
that claim incorporated by reference in its entirety. The only difference is that the 
database of representative images of Cosatto used in the rejection to claim 3 is 
replaced with the motion database for generating expressions of Chai for the reasons 
set forth in the rejections to claims 25 and particularly claim 27 above, which rejections
are also herein incorporated by reference. The motivation and combination is taken 
from claim 25. 

37. As to claim 31, this claim is a substantial duplicate of claim 4, the rejection to
which is incorporated herein by reference. Since only the primary reference is utilized, 
no separate motivation or combination is required and that from the rejection to the 
parent claim is herein incorporated by reference. 

38. As to claim 32, this claim is a substantial duplicate of claim 5, the rejection to 
which is incorporated herein by reference. Since only the primary reference is utilized, 
no separate motivation or combination is required and that from the rejection to the 
parent claim is herein incorporated by reference. 

39. As to claim 33, this claim is a substantial duplicate of claim 6, the rejection to 
which is incorporated herein by reference. Since only the primary reference is utilized, 
no separate motivation or combination is required and that from the rejection to the 
parent claim is herein incorporated by reference. 

40. As to claim 34, Cosatto does not expressly teach this limitation, whilst the system 
of Chai performs the recited limitations, in that it consists of a video camera that 
monitors the face of a user and generates an image of an avatar making similar facial 
movements, see for example Fig. 1 on page 193, the caption specifies that users act
out the motion in front of a single-view camera, and that the avatars have controlled
facial movements similar to those of the user with texture mapped models (see section
1, left side of page 194, and Figure 2 on page 195, and the captions on it). Motivation
and combination are taken from the parent claim, e.g. claim 25, and are herein incorporated by
reference. 

41. As to claim 35, this claim is merely claim 25 with the limitations of claim 20 added
to it. The rejections to both claims are herein incorporated by reference in their entirety, 
and the motivation is taken from claim 25. 

42. As to claim 36, this claim is a substantial duplicate of claim 26, with that rejection 
herein incorporated by reference; motivation and combination is from claim 25 above. 

43. As to claim 37, this claim is a substantial duplicate of claim 27, with that rejection 
herein incorporated by reference; motivation and combination is from claim 25 above. 

44. As to claim 38, this claim is a substantial duplicate of claim 28, with that rejection 
herein incorporated by reference; motivation and combination is from claim 25 above. 

45. As to claim 39, this claim is a substantial duplicate of claim 29, with that rejection 
herein incorporated by reference; motivation and combination is from claim 25 above. 

46. As to claim 40, this claim is a substantial duplicate of claim 30, with that rejection 
herein incorporated by reference; motivation and combination is from claim 25 above. 

47. As to claim 41, this claim is a substantial duplicate of claim 31, with that rejection
herein incorporated by reference; motivation and combination is from claim 25 above. 

48. As to claim 42, this claim is a substantial duplicate of claim 32, with that rejection
herein incorporated by reference; motivation and combination is from claim 25 above. 

49. As to claim 43, this claim is a substantial duplicate of claim 33, with that rejection 
herein incorporated by reference; motivation and combination is from claim 25 above. 